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ABSTRACT 

Gauss  suggested  that,  when  the  model  is  a  nonlinear  function  of 
parameters,  least  squares  parameter  estimates  might  be  obtained  by  iterative 
linearization.  To  prevent  difficulties  in  convergence,  Levenberg,  and  later 
Marquardt,  proposed  a  constrained  minimization  procedure  using  a  scale- 
invariant  metric  for  the  parameters.  If,  as  seems  sensible,  the  minimization 
is  conducted  in  a  metric  which  is  also  linearly  invariant  then  the  Levenberg- 
Marquardt  method  is  equivalent  to  a  simple  modification  of  the  Gauss  iteration 
proposed  earlier.  Methods  for  deciding,  at  each  stage,  how  far  to  move  along 
the  Gauss  solution  vector  are  introduced  which  make  economic  use  of  the 
available  information. 
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SIGNIFICANCE  AND  EXPLANATION 


The  method  of  least  squares  is  widely  used  for  the  fitting  to  data  of 
functions  containing  unknown  parameters.  When  these  functions  are  non-linear 
in  the  parameters  an  iterative  approach  using  successive  local  linearizations 

was  suggested  by  Gauss.  To  encourage  convergence  various  modifications  and 

s'' 

alternatives  have  been  suggested.  In  particular  (a)  J.t  has  been  proposed 
that,  at  the  early  stages,  use  of  the  method  of  steepest  descent  might  speed 
convergence  (b)  £  method  might  be  used  in  which  changes  were  made  in  the 
direction  of,  but  not  of  the  magnitude  of,  the  Gauss  adjustment.  It  is 
convenient  to  call  this  the  Gauss, direction  method.  Levenberg  and  later 

Marquardt  proposed 'a  spherically  constrained  procedure.  The  procedure  could 
be  made  scale-invariant  to  render  its  behavior  independent  of  the  (arbitrary) 
units  in  which  the  parameters  were  measured.  It  was  argued  the  resulting 
method  offered  an  appropriate  compromise  between  steepest  descent  and  Gauss's 
method. 

In  this  report  some  of  the  geometry  of  these  methods  is  discussed.  As  a 
consequence  it  is  argued  that  the  procedure  should  also  be  made  invariant 

r~ n 

under  linear  transformation.  With  this  modification  (a)  the  scale-invariant 
Levenberg-Marquardt  method  becomes  inappropr iate  "(bT  the  steepest  descent 
direction  and  the  Gauss  direction  become  identical.  It  remains,  therefore,  to 
determine  how  far  along  the  Gauss  direction  one  should  proceed.  Two 
alternative  methods  for  deciding  this  are  suggested.  Both  methods  use 
quantities  already  available  from  previous  calculations. 


The  responsibility  for  the  wording  an  I  views  expressed  in  this  descriptive 
summary  lies  with  MFC,  and  not  with  the  authors  of  this  report. 


CONSTRAINED  NONLINEAR  LEAST  SQUARES 


•  it 

George  E.  P.  Box  and  Hiroaitsu  Kanemasu 

1.  Nonlinear  Leant  Squares 

Suppose  an  observation  yu  is  described  in  the  model 

yu  -  f(Eu.8»  +  eu.  u  -  1,2,... ,n  (1.1 

where  ?  ■  ( C .  ,5.  )'  are  the  levels  of  k  independent  variables, 

~u  elu  -2u  *ku 

6  -  )'  are  p  unknown  parameters,  f(C  ,6)  is  a  known  function  of  € 

•  12  P  .U  .  «U 

and  6,  and  is  a  random  error.  The  method  of  least  squares  obtains  an  estimate  0 

of  the  parameters  which  minimizes  the  sum  of  squares 

"  2 

S(8)  -  I  (yu  -  f(Cu,e>>  ,  (1.2 

u-1  "  * 

or  in  vector  notation 

S(8>  -  (y  -  fg) 1 (y  -  fe)  ,  (1.3 

where  y  is  the  n  *  1  vector  of  yu#  u  *  1,2,...,n  and  f^  is  the  n  x  1  vector 
whose  uth  element  is  f(5u<8).  Under  the  assumption  that  the  errors  are  independently 

A 

and  normally  distributed  with  equal  variance,  8  is  the  maximum  likelihood  estimator  of 
9. 

The  function  f(Cu«8)  is  said  to  be  linear  in  the  parameters  if  the  derivatives 
3f (Cu,9)/38j  are  independent  of  8  for  all  j.  Otherwise,  it  is  called  nonlinear. 
When  the  model  is  linear,  the  least  squares  estimates  8  are  simply  the  solution  of  the 
normal  equation  X'X8  «  x'y  where  X  is  the  n  *  p  matrix  of  derivatives 
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3f(C  ,0)/38^.  If  the  nodal  is  nonlinear  Gauss  [11)  suggested  that  0  could  be  obtained 
by  linearizing  f(5u»8)  around  the  current  estimates  8*°*  in  successive  iterations* 


2.  Gauss  Method  end  Overshooting 

A  local  approximation  to  the  nonlinear  function  f(5u,8)  may  be  obtained  by  perform¬ 
ing  a  Taylor  series  expansion  at  some  guessed  values  8  “  and  truncating  after 

first  order  terms  to  obtain 


f(5  ,9)  a  f(C  ,8  )  + 

-u  -  -u  - 


p  n>f<e  ,8)1  .  . 

?  hr"  <#i-6l  »  “-1'2 . - 

1-1  L  1 


In  vector  notation 


fa  a  f  +  X  (8  -  0'°')  ,  (2.2) 

where  f  is  the  n  x  1  vector  of  f(S  ,8^°^)j  u  -  1,2, ...,n  and  X  i«  the  n  x  p 

_o  .u  .  .o 

matrix  whose  (u,j)  element  is  [3 f (C  ,0)/38  I  ,  .  .  This  local  linearization  gives  the 

*u  -  3  8-8'  ' 

approximate  sum  of  squares 

S<0)  -  (y  -  f  -  X  (0  -  0<O>  >  >  *  <y  -  f  -  X  (8  -  8(o>)> 

,  a  mO  mO  .  •  -O  mO  «  . 

-  e’e  -  2e  *X  (8  -  0to))+  (8  -  0{o>)’X'X  (8  -  0(o>>  (2.3) 

■  OaO  *OeO  .  •  •  «OaO  .  a 

where  e  *  y  -  f  .  Setting  the  derivatives  3S(8)/38  to  zero  gives  the  normal  equations 

.0.-0  -  j 

X’X  (0  -  8(o>  >  -  X'e  .  (2.4) 

•0.0  •  •  .0.0 

Thui  provided  that  *q*0  is  nonsingular,  new  estimates  8 * 1 ^  of  the  parameters  are  given 
by 

8(1>  -  6to)  .  (X’X  >"Ve  .  (2.5) 

.  .  .0.0  0.0 

By  linearizing  now  about  8 ^ 1  * ,  a  second  approximation  8^2*  may  now  be  obtained,  and  so 
on.  Figure  1  shows  the  parameter  space  representation  of  the  first  stage  of  this 
procedure  for  a  two  parameter  model.  Contours  for  true  and  approximate  sums  of  squares 
S(8)  and  S(8)  are  shown.  Contours  of  S(6)  are  necessarily  elliptical  while  those 
for  S(0)  have  the  appearance  of  distorted  ellipses. 


In  favorable  circumstances  the  iteration  will  converge  to  the  least  squares  estimates 


but  in  some  examples  wild  oscillations  can  occur  from  one  iteration  to  another  and  the 
process  may  not  converge.  Figure  1  shows  a  typical  case  of  initial  divergence,  where  the 
Gauss  solution  vector  (2.5)  "overshoots"  and  the  true  sum  of  squares  given  by  the  new 
estimates  8^  is  larger  than  that  given  by  the  initial  estimates  A  frequent 

cause  of  divergence  is  that  the  adjustment  8^  -  8^o)  is  too  large  and  so  invalidates 
the  linear  approximation  (2.1). 


3.  Steepest  Descent  Approach  and  Application  of  Response  Surface  Method 

Another  iterative  approach  to  nonlinear  least  squares  uses  steepest  descent  where 

.(o) 

initially  the  iteration  moves  from  the  current  point  0  xn  the  direction  of  steepest 

3S<0)  3S(0 )  '  *  (o) 

■gg~— )  evaluated  at  8  -  8  .  This  procedure  is 

p 

often  effective  in  getting  near  the  region  of  a  minimum  quickly.  However,  it  may  have  an 
extremely  slow  rate  of  convergence  after  that,  particularly  in  the  common  case  of  a  ridgy 
minimum. 

Box  and  coutie  [6]  proposed  a  method  based  on  response  surface  methodology  which 
employed  steepest  descent  at  the  early  stages  [2],  [9).  If  the  initial  point  is  remote 
from  the  minimum,  the  sum  of  squares  surface  S(8)  might  be  capable  of  local 
approximation  by  a  polynomial  in  8  of  the  first  degree.  The  sum  of  squares  is  therefore 
determined  at  a  series  of  points  in  the  parameter  space  from  which  a  planar  approximation 
may  be  fitted  and  the  direction  of  steepest  descent  calculated.  This  direction  is 
followed  until  an  increase  in  the  sum  of  squares  is  encountered.  The  whole  process  is 
repeated  until  the  need  for  a  second  degree  approximation  becomes  manifest. 

Suras  of  squares  are  then  computed  at  a  set  of  points  arranged  to  allow  a  second 
degree  polynomial  to  be  locally  fitted,  and  an  approximate  minimum  determined.  This  method 
is  (see  also  Koshal  [13])  inefficient  because  it  makes  use  of  only  the  information 
contained  in  the  sum  of  squares  of  residuals  hut  not  of  that  in  the  individual 


descent  vector,  that  is  -(■ 


38. 
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residuals  themselves*  It  was  shown  in  [3]  that  when  this  missing  information  is  included 


we  are  brought  back  to  the  Gauss  method. 

4.  Modified  Gauss  Iteration 

One  way  to  overcome  the  difficulty  of  overshooting  in  the  Gauss  iteration  is  to 
determine  direction  only  (but  not  distance)  by  the  Gauss  solution  vector 
0^9)  _  0^°)  »  (X*X  )  ^'e  .  Thus,  the  adjustment  vector  6  -  8*°^  is  given  by 

•  e  .0.0  0.0  _  . 

8  -  8(o)  -  v(X'X  )"1X,e  (4.1 > 

.  .  .0.0  .0.0 

where  v  is  a  positive  quantity  usually  less  than  unity.  This  modified  Gauss  iteration 
was  suggested  by  Box  [4]  and  incorporated  into  a  computer  program  desc-ibed  by  Booth,  Box, 
Muller  and  Peterson  [1].  In  order  to  determine  the  value  of  v  that  approximately 
minimizes  the  sum  of  squares  along  the  Gauss  solution  vector,  they  used  the  ’halving  and 
doubling*  method  in  which,  starting  from  v  “  1,  the  value  of  v  is  successively  halved 
(or  doubled)  until  the  sum  of  squares  finally  starts  to  increase  and  then  a  quadratic 
curve  is  fitted  to  the  last  three  points  to  locate  an  approximate  local  minimum.  Hartley 
(12]  later  proved  that,  under  a  set  of  mild  regularity  conditions,  the  modified  Gauss 
iteration  as  described  above  converges  to  the  solution  of  3S(8)/38 ^  >0;  j  *  1,2,...,p 
and  also  proposed  a  similar  method  to  determine  the  value  of  v. 


5.  Levenberg  Marquardt's  Constrained  Iteration 
5.1.  Levenberg1 s  Damped  Learnt  Squares 

Levenberg  [14]  tried  to  overcome  the  difficulty  of  "overshooting"  in  the  Gauss 
iteration  by  introducing  a  constraint  into  the  minimization  of  the  sum  of  squares. 

Instead  of  minimizing  the  approximate  sum  of  squares  S(8)  itself  he  proposed  to  minimize 


Pie) 


bf8)  +  \ 


j.  Vei 


e(o))2 

X 


(5.1) 


where  Its  ,lu  ,  ...,lui 

1  2  p 


are  positive  weighting  factors  expressing  the  relative  importance 
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of  damping  the  different  increments.  Substituting  (2.3)  into  (5.1), 

F{0)  -  e'e  -  2e  'x  (6  -  8(o>>  +  (6  -  0(o>)'X*X  (8  -  8(o>) 

-  .0.0  .oo.«  ..  oo.„ 

+  X(0  -  8(o))'Q(8  -  0<O))  ,  (5.2) 

where  ft  is  a  p  *  p  diagonal  matrix  whose  ith  diagonal  element  is  u>^.  Setting  the 

derivatives  3P(8)/30.  to  zero 
-  x 

-X'E  X’X  (0  -  8(o>)  +  Xft(8  -  0(o))  -  0  .  (5.3) 

0.0  oo..  .. 

Solving  for  0,  we  have 

0  -  0(o)  .  (x 'X  +  Xft)-1X,c  .  (5.4) 

.  .  .0.0  .  .0.0 

Geometrically,  in  the  parameter  space  representation,  this  amounts  to  minimising  the 
approximate  sum  of  squares  S(6)  on  the  elliptical  constraint  whose  principal  axes  are 
parallel  to  the  axes  of  0,,0.,...,0  .  This  is  illustrated  in  Figure  2.  The  dotted  curve 

12  p 

represents  the  locus  of  the  solution  of  (5.4)  for  various  values  of  X.  Levenberg  proved 

that,  provided  the  true  bus  of  squares  S(0)  does  not  already  have  a  stationary  value  at 

the  current  estimate  8*  °* ,  the  sum  of  squares  initially  decreases  as  we  move  off  the 

point  0<O> .  He  recommended  the  spherical  constraint  ft  »  1^,  where  Ip  is  the  p  *  p 

identity  matrix.  An  alternative  choice  for  ft  was  a  diagonal  matrix  D  whose  jth 

diagonal  element  was  the  jth  diagonal  element  [jj]  of  the  matrix  X'X  . 

.o  .o 

5.2.  Marquardt 'a  Algorithm 

Marquardt  (15],  also  considered  the  constrained  minimisation.  Re  examined 
geometrical  aspects  of  the  case  ft  -  1^  in  Levenberg' s  formulation.  He  noted  a  result  of 
Morrison  [17]  that  the  radius  of  the  constraining  sphere  is  a  monotone  decreasing  function 
of  X  that  tends  to  zero  as  X  ♦  «,  and  then  proved  that  as  X  ♦  -  the  solution  vector 
of  the  constrained  minimization  rotates  toward  the  steepest  descent  vector  that  remains 
fixed.  Since  the  Gauss  solution  is  of  course  a  special  case  of  the  constrained 
minimization  when  X  ■  0,  a  choice  of  some  intermediate  value  for  X  would  provide  a 
compromise  between  the  two  classical  methods.  The  algorithm  proposed  by  Marquardt,  which 


may  be  termed  (X,v)  algorithm,  reduces  the  level  of  X  by  a  factor  v  from  a 

relatively  large  initial  value  X  as  the  iteration  proceeds.  This  amounts  to  making  use 

o 

of  the  steepest  descent-like  procedure  early  in  the  iterative  process  and  then  later 
shifting  to  a  solution  that  approximates  the  Gauss  solution.  Marquardt's  method  thus 
appears  to  possess  the  virtue  of  using  each  classical  procedure  in  the  circumstances  where 
it  is  most  effective. 

Noting  that  the  steepest  descent  procedure  is  dependent  on  the  choice  of  parameter 


scales,  Marquardt  proposed  the  constrained  minimization  in  the  scale  invariant  metric, 

*  Vo 

i.e.,  the  minimization  with  a  spherical  constraint  in  6^  “  [jjl  *0^  (j  «  1,2,...,p>. 
This  is  equivalent  to  using  the  constraint  D  *  D  of  Levenberg.  Marquardt's 


recommendations  were  followed  by  Meeter  (16]  in  the  program  GAUSHAUS  at  the  University  of 


Wisconsin  and  similar  programs  have  been  used  elsewhere. 


5.3.  Constrained  Minimisation  in  the  Transfoteed  Space 

The  Levenberg-Marquardt  procedure  may  be  criticized  as  follows.  The  use  of  constant 

spherical  or  elliptical  constraints  implies  that  it  is  logical  to  solve  the  problem  in 

some  parameterization  decided  in  advance  by  the  investigator.  However,  usually  there  are 

many  ways  in  tdiich  a  problem  might  be  parameterized.  Instead  of  considering 

0,,0,,...,0  ,  we  might  with  equal  reason  consider  ♦  ,  ♦  where  “  ♦,(6)  is 

1  *  P  izp  J  J  - 

some  1:1  transformation  of  0.  Clearly,  the  nature  of  the  constraints  applied  would 

differ  depending  on  which  parameterization  was  adopted. 

Let  us  then  consider  the  problem  in  a  parameter  metric  that  is  not  only  scale 

invariant  but  also  invariant  under  linear  transformation.  Such  parameter  metric  t  is 

provided  by  <l>  «  H0  where  H  is  any  p  *  p  nonsingular  matrix  that  satisfies 

H'H  -  X'X  .  (5.5) 

.  -  .0.0 

To  see  this,  make  an  arbitrary  linear  transformation  0  ”  L0.  Correspondingly  the 
derivative  matrix  Xg  will  be  transformed  to  Xc  “  XoL-1.  The  transformation  of  0 
corresponding  to  0  +  ♦  will  be  i  “  H0  with  H'H  ■“  X'X  . 


However,  the  requirement  on 


since 


# 


H  will  be  satisfied  by  H 


HL 


H'H 


(HL_1 )* (HL*1 ) 


.  -1 


X '  XL 
o  O 


-1 


X*X_ 
o  o 


(5.6) 


Therefore 

i  =■  ii©  =  (hl_1)(l8)  =  h6  -  ,  (5.7) 

establishing  that  is  invariant  under  linear  transformation. 

In  the  new  space  1/  the  linearized  model  will  be 

f.  =!  f  +  Z  (*  -  *(o))  (5.8) 

-o  o  . 

where  f^  is  the  n  *  1  vector  of  f (5^,8(111) )»  u  «  1,2,  ...,n,  ■  H8^°\  and  ZQ  is 

the  n  *  p  matrix  whose  (u,j)  element  is  (3f(?  ,8 (<|>)  )/3<>,  ]  ,  .  .  Notice  that  Z  is 

-u  -  -  J  ^(o) 

related  to  X  by  Z  »  X_H-1.  Therefore,  the  sum  of  squares 'contours  for  the  linearized 
o  o  o 

model  given  by 

(y  -  f  -  z  (♦  -  t*°*))*(y  -  f  —  Z  ( ♦  —  ^°*)>  “  constant  (5.9) 

t  .O  o  w  «.  „o  o  . 

will  be  spherical  because 

%Zo  ~  <XoH"1>,(X0H-1>  -  H^’X^H-1  -  Ip  .  (5.10) 

The  representation  of  the  problem  would  now  be  that  illustrated  for  p  •  2  in  Figure  3  in 
which  the  contours  for  the  linearized  model  are  spheres.  To  the  extent  that  the  contours 

for  the  linearized  model  were  like  those  for  the  true  model,  this  would  ensure  that  the 

contours  of  the  true  model  were  roughly  spherical,  enabling  us  to  circumvent  the 
difficulties  arising  in  situations  of  ridgy  minimum. 

If,  in  the  linearly  invariant  metric,  we  apply  appropriate  spherical  constraints  we 
must  minimize 

FOP)  -  (c  -  z  dp  -  *(o> )]  ’  te  -  z  (*  -  <p(o))] 

•  .o  O  «  •  -O  o  •  » 

+  X[<*  -  *(0>>M*  -  <P<0>>  -  Y2]  ,  (5.11) 

where  e  =  y  -  f  and  Y  is  the  radius  of  the  sphere.  Setting  the  derivatives 
•  O  .  ,o 

3F(<(i)/3!)<i  (i  “  1,...,p)  to  zero, 

-2z'e  +  2Z'Z  Op  -  ip(o>>  +  2X(*  -  ip(o)>  -  0  .  (5.12) 

0.0  o  o  „  - 

Solving  for  ♦  we  obtain 

i(i  -  ipto>  =  (s'Z  +  XI  )-1Z’e  .  (5.13) 

w  «  „o.O  «,p  *0.0 


►  • 

m:. 
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Constraints 


Figure  3.  Constrained  minimization  of  the  sum  of  squares  in  the 
linearly  invariant  metric. 


Because  of  (5.10),  this  is 


— — r  Z*e  . 

1  +  X  .0.0 


(5.14) 


It  can  be  shown  that,  as  long  as  (-A)  is  chosen  to  be  less  than  the  minimum  eigenvalue 

of  Z^Zo  (that  is,  less  than  unity),  the  solution  (5.13)  provides  the  point  that 

minimises  in  the  space  ♦  the  linearized  model  sum  of  squares  on  the  sphere  of  radius 
2 

(1/1  +  A)  e'z  Z'e  .  (See  Draper  [10]  for  the  analysis  of  stationary  points  of  the  general 
_o  o  o  .o 

quadratic  response  surface  on  spherical  constraints.)  We  also  note  that  this  solution  is 

the  vector  of  steepest  descent  (except  for  the  irrelevant  scalar)  since  by  differentiating 

the  negative  of  the  true  sum  of  squares  S(4>)  “  (y  -  f^)'(y  -  f^>  with  ♦  and  evaluating 

the  derivative  at  ♦  -  ° *  we  obtain  2Z'e  .  Now  transforming  (5.14)  back  to  the 

,  «_  .0.0 


original  metric  6 , 


which  gives 


H(6  -  8(o)) 


T-hr  ' 


r  H-1(X  H-1  )*e  “  (H'H)~'x*e 

1  ♦  X  O  .O  1  ♦  A  0.0 


Because  of  (5.5),  we  finally  obtain 


r  (x*x  )",X‘e  . 

1  ♦  A  o  o  ojo 


(5.15) 


(5.16) 


(5.17) 


This  is  the  same  as  (4.1),  where  the  requirement  (-A)  <  1  implies  that 

v  «  1/(1  +  A)  can  range  between  0  and  ~ .  Somewhat  surprisingly  then  the  Levenbarg- 

Marquardt  constrained  minimization  performed  in  the  linearly  invariant  metric  is  exactly 

the  modified  Gauss  method  discussed  earlier.  It  should  be  noted  also  that  in  the  linearly 

invariant  metric  there  is  no  question  of  a  compromise  between  the  two  classical  methods. 

Because,  using  (5.10),  the  Gauss  vector  (Z'Z  )  'z'e  is  equal  to  Z'e  which  is  the 

.0.0  .0.0  .0.0 

steepest  descent  vector.  (In  Figure  3,  the  path  that  is  the  Gauss  vector  as  well  as  the 
steepest  descent  vector  is  shown  by  the  dotted  straight  line.  The  path  which  the 
Levenberg-Marquardt  procedure  would  take  is  shown  by  the  bold  connected  curve.) 

So  far  as  the  problem  of  speedy  convergence  is  concerned,  there  does  not  therefore 
seem  to  exist  much  concrete  theoretical  basis  for  interpolation  between  the  two  classical 
methods.  One  incidental  advantage  of  the  Levenberg-Marquardt  procedure  may  be  that  the 
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matrix  (X^X^  +  can  be  inverted  even  when  the  matrix  X^XQ  is  singular  or  nearly 

singular,  thus  always  giving  a  "solution".  Practical  experience,  however,  leads  us  to 
believe  that  the  possibility  of  not  having  a  singularity  or  near-singularity  brought  to 
one’s  attention  is  a  disadvantage  rather  than  an  advantage.  It  has  often  been  pointed  out 
(for  example  (2])  that  a  minimum  is  often  better  approximated  by  a  line,  plane,  or 
hyperplane  than  by  a  point.  When  this  happens,  it  is  important  that  it  be  brought  to  the 
investigator's  notice.  One  method  for  ensuring  this  is  by  means  of  canonical  analysis 
suggested,  for  example,  in  [5). 

6.  Methods  to  Determine  Bow  Par  One  Should  Go  Along  the  Gauss  Solution  Vector 

In  the  preceding  section,  we  have  given  theoretical  support  to  the  idea  that  we 
should  explore  the  Gauss  vector  itself  rather  than  the  path  followed  by  the  Levenberg- 
Harquardt  procedure.  We  have  done  this  by  demonstrating  that  the  Gauss  vector  may  be 
arrived  at  by  applying  Levenberg-Marquardt  constrained  minimization  in  the  linearly 
invariant  parameter  metric. 

We  recall  that  the  direction  of  the  Gauss  vector  in  the  original  parameter  metric 
corresponds  to  that  of  the  steepest  descent  considered  in  the  linearly  invariant  metric. 
Also  applying  the  Levenberg's  result  mentioned  in  Section  5.1  to  the  constrained 
minimization  in  the  linearly  invariant  metric,  we  can  be  assured  that  the  true  sum  of 
squares  initially  decreases  as  we  start  off  from  the  current  estimate  0*°'  on  the  Gauss 
vector  provided  8*°*  is  not  already  the  stationary  point. 

The  question  of  how  far  one  should  go  along  the  Gauss  vector  still  remains.  Hie 
"halving  and  doubling"  method  already  mentioned  could  be  used.  But  its  disadvantage  xs 
that,  after  the  Gauss  vector  is  determined  for  9*°*,  the  function  f(Cu»6>  must  be 
additionally  evaluated  for  u  “  1,2,...,n  to  calculate  S(8)  at  each  new  "test  point"  in 
the  parameter  space.  We  also  could  apply  the  (X,v)  algorithm  of  Marquardt  to  (5.17). 
Although  no  compromise  between  the  original  Gauss  method  and  the  steepest  descent  method 
is  here  involved,  it  may  still  make  sense  to  gradually  decrease  1  so  as  to  constrain  the 
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iteration  less  and  lass  aa  the  minimum  is  approached  in  successive  iterations.  Again, 

however,  there  is  no  way  to  know  in  advance  how  best  to  choose  and  v.  Below  we 

present  two  methods  to  determine  the  value  of  v  in  (4.1)  at  each  iteration  in  such  a  way 

as  to  make  use  of  only  the  information  already  available  from  the  preceding  computation. 

Modification  A.  To  obtain  the  Gauss  solution  vector  it  is  necessary  to  compute  the 

matrix  X  of  the  partial  derivatives  (3f (C  ,9)/30  ]  .  Using  this  matrix,  it  is 

-u  -  J  e-9l°' 

clearly  possible  to  obtain  the  initial  rate  of  change  of  *the  true  sum  of  squares  along  the 
Gauss  solution  vector.  In  fact. 


(6.1) 


is  the 


and 


6  -  e(o>  -  v(X*X  )"Ve  ,  we  obtain 
.  .  00  0.0 


-2c >x  (x’X  ) 
.0  000 


x  e 
0.0 


-2(0 


(g) 


0<o)) 


x’e 

0.0 


(6.2) 


Incidentally,  equation  (6.2)  gives  a  direct  proof  of  a  corollary  of  Levenberg' a 

result  mentioned  above  that  the  true  sum  of  squares  initially  decreases  as  we  start  moving 

from  6(o)  to  0^,  because  provided  that  X^XQ  is  positive  definite  Pjj^l  will  be 

-  -  v»0 


negative  except  when  Xvt 
o*o 


-  s  lif-U  11  !• 


To  locate  a  point  along  the  Gauss  solution  vector  at  which  the  true  sum  of  squares  is 


approximately  minimised,  we  first  suppose  that  the  true  sum  of  squares  follows,  along  this 


vector,  a  quadratic  function  S  »  a  ♦  bv  +  cv  .  He  can  determine  the  constants  a,  b 


and  c  by  setting  S  •  S_  for  v  ■  0,  S 


S  for  v  *  1  and  setting 


to  that 
the 


given  by  (6.2).  Consequently,  provided  c  ■  S„  -  S_  ♦  2(6 


value  of  v  for  which  the  true  sub  of  squares  is  approximately  minimized  is  given  by 


(8<9>  -  8<o))'XUo 

*in  s  -  s  ♦  2<e(g>  -  e(0)),x,e 

go  as  0*0 

Thus,  we  may  take  our  next  estimate  as  8*1*  ■  If  c  <  0,  we 

have  that  the  actual  sum  of  squares  at  distance  v  -  1  is  already  smaller  than  that 

predicted  by  the  initial  slope  since  in  this  case  <  SQ  +  [dS/dv]  Therefore  we  may 

settle  at  v  ■  1,  or  double,  or  redouble  the  distance,  checking  at  each  point  to  see  if 
the  decreasing  trend  is  continuing.  Figure  4  illustrates  various  situations  that  could 


occur. 

Modification  B.  A  procedure  developed  following  a  suggestion  by  Jack  Draffen*  is 
another  method  that  makes  use  of  the  existing  information  in  an  interesting  manner.  The 
quantities  So  and  are  obtained  by  computing,  squaring,  and  summing  the  elements  of 

the  two  residual  vectors 

Vo  ”  <y1  '  f,?1'!<0))'y2  "  f(!2'!<0>) . yn  ‘  f,5n'?<0))) 

and 

e*  -  <Yl  -  f<C1,8<q*),y2  -  >,*•., y„  -  •  (6.4) 

Consider  a  point  along  the  Gauss  solution  vector  with  the  distance  v!8*9*  -  0*o>l  away 
(o> 


from  the  origin  8 
point  by 

which  may  be  written 


By  linear  interpolation  we  can  estimate  the  residuals  at  this 


(1  -  v>e  ♦  vc 

-o  -g 


(6.5) 


c  -  v(e  -  e  )  +  e  .  (6.6) 

.o  . o  .g 

Thus  the  value  v  of  v  for  which  the  sum  of  squares  of  the  estimated  residuals  will  be 
as  small  as  possible  can  be  obtained  by  regressing  on  ~  so  that 


1 


Personal  communication  (1972). 


SUM  OF  SQUARES 


<fo 


(e  -  e  )'e 
_^0 - r9  -°. 


e  )(e  -  e  ) 

.9  .0  .g 


(6.7) 


whence  our  estimate  of  parameters  is  obtained  by  8*1*  “  ♦  v(8*g*  -  0^°S.  Again, 

*  2 
computing  v  is  vary  simple  and  makes  use  of  information  already  available. 

In  the  sample  space,  modification  B  corresponds  to  dropping  a  perpendicular  line  from 

y  to  f  -  f  whose  foot  gives  the  vector  (1  -  v)f  +  vf  .  This  is  illustrated  in 
i  _g  _o  .0  .g 

Figure  5  for  a  one  parameter  model.  The  relationship  of  this  procedure  to  modification  A 
can  be  seen  as  follows.  The  sum  of  squares  of  the  estimated  residuals  can  be  written  as 
e'e  -  (y  -  f  -  v( f  -  f  >)'(y  -  f  -  v(f  -  f  >>  (6.8) 

•  »  *  -  O  «9  . 0  m  .O  m  o 

where  f  is  the  n  *  1  vector  of  f(?  ,8*g*),  u  -  1,2,...,n.  This  is  quadratic  in  v 
-g  - 

and  passing  through  the  points  ( 0 , SQ )  and  (1,8^).  Furthermore,  the  initial  slope  of 
the  sum  of  squares  of  the  estimated  residuals  is  given  by 


-2(f  -  f  >'(y  -  f  )  -  -2(f 
•  9  .0  w  .0 


t  )‘t 
.0  .0 


(6.9) 


which  is  identical  to  the  initial  slope  used  in  the  previous  method  provided  that 

f  -  f  +  X  (8<g>  -  0<o>). 

.g  .0  o  .  . 


7.  ■sample 

In  comparing  various  methods  little  in  the  way  of  general  conclusions  can  be  based  on 
the  performance  of  particular  examples.  The  following  simple  example,  taken  from  Box  and 
Hunter  (8]  in  an  abbreviated  form,  is  provided  only  for  the  purpose  of  gaining  some 
preliminary  ideas  of  the  behavior  of  the  procedures  we  have  discussed. 


2When  the  criterion  to  stop  the  iterative  process  is  made  too  stringent,  in  the  extreme 
neighborhood  of  the  minimum  both  the  numerator  and  the  denoeiinator  of  (6.3)  and  (6.7)  may 
become  too  small  for  the  particular  arithmetic  precision  being  used.  A  precautionary 
provision  that  checks  this  is  desirable  to  avoid  unnecessary  use  of  resources. 
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The  Modal  la  f(C,fl)  ■  *  91^1  +  500052>  and  tha  obaarvationa  (C^Cj.y) 

ara  (1,1,0.1165),  (2,1,0.2114),  (1,2,0.0684)  and  (2,2,0.1159).  Tha  sum  of  squarea 

aorfaea,  plottad  in  Figure  6,  la  vary  curvad  and  ridgy.  The  valuea  ( © ^ , 6 2 >  “  (300,6) 

corresponding  to  PQ  in  tha  figura  ara  tha  choaan  initial  aatiaataa. 

Figure  7  shows  tha  raault  for  fiva  diffarant  aathoda.  It  shows  the  number  of  tiaea, 

nt,  that  f(C,8)  haa  to  be  evaluated  before  tha  iterative  proceaa  reachea  the  point  of 

minimum  Pm  (at  trtiich  tha  bum  of  aquaraa  "  3.82750  »  10~5).  For  the  two  aathoda  that  uae 

tha  (X,v)  algorithm  n.  ia  ahown  for  varioue  choicaa  of  X  and  v,  while  that  for 

‘  o 

tha  nethoda  which  do  not  depend  on  any  control  peraaetera  ia  indicated  by  a  horizontal 
line. 

Conaidering  that  tha  aathoda  with  tha  (X,v)  algorithm  require  a  auitabla  choice  of 

X  and  v  in  advance,  tha  behavior  of  tha  modified  Gauaa  aathoda  with  aodificationa 
o 

A  and  B  ia  certainly  not  diacouraging.  It  will  alao  be  intereeting  that  tha  modified 

Gauaa  method  with  tha  (X,v)  algorithm  appeara  a tab la  over  a  wide  range  of  X  ,  if  wa 

o 

recall  that  tha  awdifiad  Gauaa  aolution  ia  tha  raault  of  tha  conatrained  minimization  in 
the  apaca  which  requirea  no  interpolation  among  the  two  claaaical  approachea  that  often 
contradict  each  other. 

8.  Oomelasioa 

Tha  La vanberg-Karquardt ' a  conatrained  iteration  haa  been  widely  programmed  and 
uaad.  One  apparent  juatification  ia  that  it  providea  a  compromize  between  the  ateepeat 
daacent  method  and  the  Gauaa  method.  However,  if  tha  parameters  ara  transformed  into  the 
linearly  invariant  metric,  tha  ateepeat  daacent  vector  and  tha  Gauaa  aolution  vector  are 
found  to  be  identical  and  thua  there  ia  no  need  to  compromise  between  directions  given  by 
these  two  vac tor a.  It  ia  also  shown  that  the  conatrained  minimisation  in  tha  linearly 
invariant  metric  ia  equivalent'  to  using  in  tha  original  metric  tha  modified  Gauaa  method 
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Figure  7.  Comparison  of  five  different  procedures  in  the  example 


which  was  proposed  earlier.  Two  methods  are  proposed  to  determine  how  far  one  should  go 
along  the  Gauss  vector.  These  both  have  the  virtue  that  their  computation  only  involves 
intonation  that  already  exists. 


Ne  wish  to  thank  Professors  William  Hunter  and  Norman  Draper  of  the  University  of 
Wisconsin  for  their  encouragement  and  comments  on  this  work. 
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